Probabilistic word sense disambiguation: Analysis and techniques for combining knowledge sources

Author

  • Judita Preiss
Abstract

This thesis shows that probabilistic word sense disambiguation (WSD) systems based on established statistical methods are strong competitors to current state-of-the-art WSD systems. We begin with a survey of approaches to WSD and examine their performance in the systems submitted to the Senseval-2 WSD evaluation exercise. We discuss existing resources for WSD and investigate the amount of training data needed for effective supervised WSD. We then present the design of a new probabilistic WSD system. The main feature of the design is that it combines multiple probabilistic modules using both Dempster-Shafer theory and Bayes’ rule. Additionally, the use of Lidstone’s smoothing provides a uniform mechanism for weighting modules according to their accuracy, removing the need for a separate weighting scheme. Lastly, we evaluate our probabilistic WSD system using traditional evaluation methods and introduce a novel task-based approach. When evaluated on the gold standard used in the Senseval-2 competition, the performance of our system lies between that of the first- and second-ranked WSD systems submitted to the English all-words task. Task-based evaluations are becoming more popular in natural language processing because they provide an absolute measure of a system’s performance on a given task. We present a new evaluation method based on subcategorization frame (SCF) acquisition. Experiments with our probabilistic WSD system show an extremely high correlation between SCF acquisition performance and WSD performance, demonstrating the suitability of SCF acquisition as a WSD evaluation task.
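
The abstract names the combination methods and the smoothing-based weighting without giving formulas. As a hedged illustration only, the Python sketch below shows the standard textbook forms: Lidstone's estimate (c(s) + λ) / (N + λ|S|), a Bayes' rule combination that assumes a uniform sense prior and conditionally independent modules, and Dempster's rule of combination. The sense labels, counts, λ values, the reliability-based `to_mass` mapping and all function names are hypothetical assumptions, not the thesis's implementation.

```python
"""Sketch: Lidstone smoothing plus two ways of combining per-module sense
distributions (a naive Bayes product and Dempster's rule of combination).
Module counts, sense labels and reliabilities below are hypothetical."""
from collections import Counter
from itertools import product
from math import prod
from typing import Dict, FrozenSet, List, Mapping, Sequence

Dist = Dict[str, float]             # sense -> probability
Mass = Dict[FrozenSet[str], float]  # set of senses -> belief mass


def lidstone(counts: Mapping[str, int], senses: Sequence[str], lam: float) -> Dist:
    """Lidstone estimate P(s) = (c(s) + lam) / (N + lam * |S|).
    A larger lam pulls the estimate towards uniform, so the module says less."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    total = sum(counts.get(s, 0) for s in senses)
    denom = total + lam * len(senses)
    return {s: (counts.get(s, 0) + lam) / denom for s in senses}


def combine_bayes(dists: List[Dist], senses: Sequence[str]) -> Dist:
    """Bayes' rule under a uniform sense prior and conditionally independent
    modules: the posterior is proportional to the product of module outputs."""
    scores = {s: prod(d[s] for d in dists) for s in senses}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}


def to_mass(dist: Dist, senses: Sequence[str], reliability: float) -> Mass:
    """Hypothetical mapping: singleton masses scaled by a reliability factor
    (Shafer-style discounting); the remainder goes to the whole set of senses."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    mass: Mass = {frozenset([s]): reliability * p for s, p in dist.items()}
    if reliability < 1.0:
        mass[frozenset(senses)] = 1.0 - reliability
    return mass


def dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalize by 1 - K, where K is the mass on empty intersections."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


if __name__ == "__main__":
    senses = ["bank/river", "bank/finance"]
    # Assumption: a module judged more accurate gets a small lam, a weaker one a larger lam.
    p_a = lidstone(Counter({"bank/finance": 8, "bank/river": 2}), senses, lam=0.5)
    p_b = lidstone(Counter({"bank/finance": 3, "bank/river": 1}), senses, lam=2.0)
    print("Bayes:", combine_bayes([p_a, p_b], senses))
    ds = dempster(to_mass(p_a, senses, 0.9), to_mass(p_b, senses, 0.6))
    print("Dempster-Shafer:", {tuple(sorted(k)): round(v, 4) for k, v in ds.items()})
```

With reliability 1.0 every mass sits on a single sense, and Dempster's rule then reduces to the normalized product computed by `combine_bayes`; the two schemes only diverge once some mass is left uncommitted on the whole set of senses. A larger λ flattens a module's distribution, so under the product rule its vote moves closer to uniform and counts for less, which is one plausible reading of how smoothing can double as a weighting scheme.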

Similar resources

Word Sense Disambiguation using Optimised Combinations of Knowledge Sources

Word sense disambiguation algorithms, with few exceptions, have made use of only one lexical knowledge source. We describe a system which performs unrestricted word sense disambiguation (on all content words in free text) by combining different knowledge sources: semantic preferences, dictionary definitions and subject/domain codes along with part-of-speech tags. The usefulness of these sources...

Combining Weak Knowledge Sources for Sense Disambiguation

There has been a tradition of combining different knowledge sources in Artificial Intelligence research. We apply this methodology to word sense disambiguation (WSD), a long-standing problem in Computational Linguistics. We report on an implemented sense tagger which uses a machine readable dictionary to provide both a set of senses and associated forms of information on which to base disambigu...

Combining Supervised and Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation

This work combines a set of available techniques – which could be further extended – to perform noun sense disambiguation. We use several unsupervised techniques (Rigau et al., 1997) that draw knowledge from a variety of sources. In addition, we also apply a supervised technique in order to show that supervised and unsupervised methods can be combined to obtain better results. This paper tries ...

Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods

In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We present two WSD methods based on two main methodol...

Combining Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation

This paper presents a method to combine a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus. Although most of the techniques for word sense resolution have been presented as stand-alone, it is our belief that full-fledged lexical ambiguity resolution should combine several information sources and techniques. The set of techniques ...

Publication date: 2006